1 Introduction

The problem of extracting the optical flow has, in recent years, been treated from a new point of view, namely through the use of space-time filters [7,4,6,5]. The basic idea behind this method is to extract the optical flow using only a collection of filters tuned to different orientations in space-time (or, equivalently, in the frequency domain). Once the outputs of these filters have been computed, we still need a method for determining the value of the estimated optical flow. The output of a single filter tuned to a specific orientation (even one with maximal response) is not enough to extract the optical flow, so we have to use a complete set of filters, in the sense that the set takes into account all possible orientations. In space-time filtering, we convolve a sequence of images with a (space-time) filter, where the interval between successive images is small. The minimum temporal interval between successive images is dictated mainly by practical considerations: if it is too small, we gain little information about the moving pattern from frame to frame. On the other hand, we would like to know how large the temporal interval between successive images can be while a filtering approach to the extraction of optical flow remains usable.

The answer to this question comes from considering the sampling issues involved in this filtering process. As I will show in section 3, if there exists a certain degree of motion uncertainty, then the maximum sampling interval is fixed by this motion uncertainty. This means that there is a (non-linear) relationship between the motion uncertainty and the maximum sampling interval.

The procedure of using a collection of filters to extract optical flow corresponds, in a general sense, to a signal processing approach, which is mainly concerned with extracting information about the original signal in the presence of noise. It involves the construction of filters (optimal ones, if possible), parameter estimation and the analysis of sampling issues.

On the other hand, in the feature-based approach to the extraction of optical flow [3], it is necessary, before the actual computation of the optical flow, to extract edges (zero crossings), which then have to be matched in successive frames. If the temporal interval between successive images is large and the number of edges to be matched between frames is not high, then a contour-based approach can extract the optical flow successfully. If, however, the number of edges is large, the number of mismatches can lead to a high error rate and, as a consequence, to a wrong estimate of the optical flow.

It is therefore important to be able to detect moving features and extract their optical flow in the presence of noisy data and imprecise measurements. Depending on the spatial complexity of the information available in each image of the temporal sequence, a filtering approach can be more reliable. This is especially true when the interval between images is small and the spatial information content is high, which makes a feature-matching approach highly unstable.

In this paper we discuss, in section 2, the extraction of optical flow through feature- or intensity-based approaches versus space-time filtering, and present the space-time DOG cascade as an energy filter. In section 3 we analyse the sampling issues that apply to uniformly translating patterns in the presence of noise (motion uncertainty). Finally, we draw conclusions in section 4 and make an analogy between the long- and short-range processes of motion extraction in the human visual system and the feature-based and space-time filtering methods in Computer Vision.


2 Space-time filtering

2.1 Extraction of the optical flow in intensity- and feature-based approaches

Until very recently, Computer Vision treated the extraction of the optical flow field from intensity variations in the image plane as a feature- or intensity-based problem. In the feature-based approach we first detect relevant features, such as edges, in a pair of successive images of a temporal sequence. We then match corresponding elements, and as a result of this procedure we assign a specific value of the optical flow to each pair of corresponding elements.

This method has to overcome two major problems:

1. The correspondence problem.

2. The aperture problem.

The correspondence problem [1] addresses the question of how to assign the same identity to elements that appear in a temporal succession of images. The correspondence, or matching, can be computed in different ways, depending in part on the temporal interval between successive images. If this interval is small, the correspondence between features can be established through a set of local operations over elements that are spatially close to each other. One of these operations [1,17] consists in minimizing the distance that a set of elements travels from one image to the next. If, on the other hand, this temporal interval is large, a more global type of operation will probably be needed to match features. In general, matching corresponding elements in successive images can be unstable because of noise in the image. It can also be computationally expensive if the number of features to be matched is large.
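As an illustration of the distance-minimization idea, the sketch below matches two small point sets by exhaustively searching for the one-to-one assignment with the smallest total displacement. It is a toy under stated assumptions: the two point sets have equal size and there is no noise. The function name `minimal_mapping` is hypothetical. The search is factorial in the number of features, so it is only practical for a handful of them, which echoes the cost issue mentioned above.

```python
from itertools import permutations
from math import dist


def minimal_mapping(frame1, frame2):
    """Return (cost, pairing), where pairing[i] is the index in frame2 matched to frame1[i].

    Exhaustive search over all one-to-one assignments. It is O(n!), so use it only for tiny n.
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(frame2))):
        cost = sum(dist(frame1[i], frame2[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


# Three features translated by (0.5, 0.2) and listed in shuffled order in frame 2.
f1 = [(0.0, 0.0), (3.0, 1.0), (1.0, 4.0)]
f2 = [(1.5, 4.2), (0.5, 0.2), (3.5, 1.2)]
cost, pairing = minimal_mapping(f1, f2)
print(pairing)  # [1, 2, 0]
```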

The aperture problem states that it is not possible to measure both components of the optical flow field through a small aperture in the image. To deal with it, we have to introduce additional constraints into the model describing the extraction of optical flow, so that the full optical flow field can be obtained. Indeed, through a small aperture we can only measure the component of the optical flow field normal to the intensity contour (that is, along the intensity gradient), while its tangential component remains undetermined. One example of a solution to the aperture problem is the area-based formulation [2]. It adds to the intensity continuity equation a smoothness term given by the sum of the squares of the spatial derivatives of the optical flow field components. Another example is the contour-based approach [3]. There the constraint is given by the derivative of the optical flow field with respect to arc length along the contour, in addition to the difference between the normal component of the optical flow and its measured value.
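To make the aperture problem concrete, the following sketch assumes the standard brightness-constancy form of the intensity continuity equation, I_x u + I_y v + I_t = 0. It returns the only flow component this equation determines, the normal flow −I_t ∇I / |∇I|². The test pattern is a hypothetical linear intensity ramp, whose local structure is a single straight edge. For this ramp the true flow (2, 5) is recovered only as its projection onto the gradient direction, and the tangential part is lost.

```python
def normal_flow(Ix, Iy, It):
    """Normal component of the optical flow from brightness constancy.

    Ix*u + Iy*v + It = 0 fixes only the projection of (u, v) onto the
    gradient (Ix, Iy); the tangential component is unconstrained.
    """
    g2 = Ix * Ix + Iy * Iy
    if g2 == 0.0:
        raise ValueError("zero gradient: the flow is completely undetermined")
    s = -It / g2
    return (s * Ix, s * Iy)


# Ramp I(x, y, t) = a*x + b*y - (a*u + b*v)*t translating with true flow (u, v).
a, b = 1.0, 0.0          # gradient points along x, so the "edge" is vertical
u, v = 2.0, 5.0          # true flow
Ix, Iy, It = a, b, -(a * u + b * v)
print(normal_flow(Ix, Iy, It))  # (2.0, 0.0): the vertical component 5.0 is invisible
```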


In practice, both the correspondence problem and the aperture problem involve a certain amount of arbitrariness, because we must choose a set of constraints that lets us extract the full optical flow. It is therefore desirable to avoid having to cope with both of these problems. The method of space-time filtering does this in part, by eliminating altogether the need to solve the correspondence problem. For the aperture problem, the solution given by Heeger [5] models the image flow as (locally) purely translational, so that the optical flow is extracted by fitting a plane to the output energies of the filters. This is equivalent to computing both components of the optical flow field, because for translational motion the support in the frequency domain is a plane whose orientation is a function of the velocity vector.
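A minimal sketch of the geometric core of plane fitting, under simplifying assumptions. We take the spectral support of a translating pattern to be the plane w = k·v, which is the convention used later in (2.14); the sign depends on the Fourier convention. We then sample noisy synthetic points on that plane and recover v = (v_x, v_y) by least squares through the 2 × 2 normal equations. This is not Heeger's energy-based fit. The function name and the synthetic data are illustrative only.

```python
import random


def fit_velocity_plane(points):
    """Least-squares v minimizing sum (w - kx*vx - ky*vy)^2 over (kx, ky, w) samples."""
    sxx = sum(kx * kx for kx, ky, w in points)
    sxy = sum(kx * ky for kx, ky, w in points)
    syy = sum(ky * ky for kx, ky, w in points)
    sxw = sum(kx * w for kx, ky, w in points)
    syw = sum(ky * w for kx, ky, w in points)
    det = sxx * syy - sxy * sxy          # determinant of the 2x2 normal equations
    vx = (syy * sxw - sxy * syw) / det
    vy = (sxx * syw - sxy * sxw) / det
    return vx, vy


rng = random.Random(0)
v_true = (0.8, -0.3)
pts = []
for _ in range(200):
    kx, ky = rng.uniform(-1, 1), rng.uniform(-1, 1)
    w = kx * v_true[0] + ky * v_true[1] + rng.gauss(0.0, 0.01)  # small noise
    pts.append((kx, ky, w))
print(fit_velocity_plane(pts))  # close to (0.8, -0.3)
```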

2.2 Space-time oriented filters

Space-time filtering consists, basically, in convolving a temporal sequence of closely spaced images with a (space-time) filter. Its most important aspect is that, for a uniformly translating pattern, we can select a specific velocity by using (space-time) oriented filters [6].

Let us take the example of one-dimensional motion (in the x direction). Consider the picture that a uniformly translating pattern generates in space-time, taken as a cross-section parallel to the x-t plane. In this picture the orientation of the individual elements (such as lines or stripes) is intrinsically determined by the velocity of the pattern: the slope of a line in the x-t plane equals the velocity of the feature associated with it. A very interesting example of this relationship between (space-time) orientation and velocity is given by the epipolar-plane images (EPIs) created by Bolles and Baker [8]. They considered a camera moving through a static environment with its optical axis perpendicular to the direction of motion. In a given EPI, we can track the temporal evolution of each image element (at a fixed height), and this evolution is described by a straight line.
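The claim that a uniformly translating pattern traces straight lines of slope v in the x-t plane can be checked directly. The sketch below builds I(x, t) = f(x − v t) for an arbitrary, hypothetical profile f. It then confirms that the intensity is constant along the line x = x₀ + v t and is generally not constant along the vertical line x = x₀.

```python
import math


def profile(x):
    # Arbitrary smooth 1-D intensity profile (hypothetical test pattern).
    return math.sin(1.3 * x) + 0.5 * math.cos(0.4 * x + 1.0)


def xt_slice(v):
    """Space-time intensity I(x, t) of the profile translating at velocity v."""
    return lambda x, t: profile(x - v * t)


v, x0 = 0.75, 2.0
I = xt_slice(v)
along_line = [I(x0 + v * t, t) for t in range(10)]   # follows the slope dx/dt = v
fixed_x = [I(x0, t) for t in range(10)]              # stays at one position
print(max(along_line) - min(along_line))  # 0.0: constant along the line
print(max(fixed_x) - min(fixed_x))        # clearly non-zero
```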
Once we know that, for uniform translation, the space-time evolution of image elements is given by straight lines, we can use (space-time) oriented filters [6,7] to select a specific velocity (optical flow). A particularly important aspect of this analysis is that, for a uniformly translating pattern, the support of the contrast function in the frequency domain is a plane passing through the origin of the coordinate system (or a line, for one-dimensional motion) [7,10]. In terms of space-time filtering, this means that to select the specific velocity of a uniformly translating pattern, we have to tune the filter to the orientation in the frequency domain that gives the highest response.
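In a discrete, periodic setting this frequency-domain statement can be demonstrated with NumPy. A random 1-D pattern shifted by an integer v pixels per frame has all of its non-DC spectral energy on the line w ≡ −k v (mod N) under NumPy's FFT sign convention. Summing the energy along candidate lines and keeping the largest sum therefore picks out the velocity. Summing along a line is a stand-in for a bank of ideally oriented filters, and the periodic boundary and integer velocities are simplifying assumptions of this sketch.

```python
import numpy as np


def select_velocity(seq, candidates):
    """seq[t, x] is a periodic x-t slice. Return the candidate velocity with most energy on its line."""
    T, N = seq.shape
    assert T == N, "this toy assumes a square x-t slice"
    P = np.abs(np.fft.fft2(seq - seq.mean())) ** 2   # P[w, k], with the DC term removed
    k = np.arange(N)
    scores = {}
    for v in candidates:
        w = (-k * v) % N          # frequency line of a pattern moving v px/frame
        scores[v] = P[w, k].sum()
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)
N, v_true = 64, 3
f = rng.standard_normal(N)
seq = np.array([np.roll(f, v_true * t) for t in range(N)])   # I(x, t) = f(x - v t)
best, _ = select_velocity(seq, candidates=range(-5, 6))
print(best)  # 3
```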

Directionally selective (space-time) filters have a limitation: they are phase sensitive [6]. This means that we can get different results depending on how the space-time configuration of the moving pattern is aligned with the filter shape, and the filtered output may oscillate between positive and negative values. A solution to this problem is to compute the energy (power spectrum) of the filter output. The energy of the filtered signal is independent of phase, and for a uniformly translating pattern it is constant.
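A minimal 1-D illustration of this phase independence follows. It assumes the usual quadrature construction of an energy filter, in which the squared outputs of an even and an odd Gabor filter are summed. At a fixed position, the response to a drifting grating oscillates in sign, while the sum of squares stays essentially constant. The grating, filter width and sampling grid are arbitrary choices made for the sketch.

```python
import math

SIGMA, K0 = 4.0, 1.0                      # Gabor envelope width and tuned spatial frequency
DX = 0.05
XS = [i * DX for i in range(-600, 601)]   # integration grid covering [-30, 30]


def gabor_pair_response(signal):
    """Even and odd Gabor responses at x = 0 (Riemann sum of the correlation)."""
    even = odd = 0.0
    for x in XS:
        g = math.exp(-x * x / (2 * SIGMA ** 2))
        even += g * math.cos(K0 * x) * signal(x) * DX
        odd += g * math.sin(K0 * x) * signal(x) * DX
    return even, odd


def drifting_grating(v, t):
    """Grating cos(K0 (x - v t)) frozen at time t."""
    return lambda x: math.cos(K0 * (x - v * t))


v = 0.5
evens, energies = [], []
for t in range(20):
    e, o = gabor_pair_response(drifting_grating(v, t))
    evens.append(e)
    energies.append(e * e + o * o)

print(min(evens) < 0 < max(evens))          # True: the even output changes sign
print(max(energies) / min(energies) - 1.0)  # tiny: the energy is constant
```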

If, for example, we compute the energy associated with a space-time Gabor filter [5] convolved with an arbitrary function, the final result will not oscillate or depend on any phase factor. This leads to the concept of space-time oriented filters as energy filters. Assuming randomly textured images and using Parseval's theorem, Heeger [5] was then able to predict analytically the energy associated with a particular space-time oriented pattern.


2.3 Space-time Difference-of-Gaussian (DOG) cascade as an energy filter

Space-time filtering, whether through energy filters or cascades, is primarily concerned with processing a temporal sequence of images in which the interval between successive images is small. The work of Heeger [5] showed that it is possible to obtain a dense image flow map by using a collection of twelve space-time Gabor filters, each tuned to a different direction in space-time. The space-time Gabor filter is parametrized by three (Gaussian) filter sizes (σ_x, σ_y and σ_t), in addition to the three sine or cosine space-time frequencies, whose relative ratios correspond to different orientations in space-time. A broader space-time tuning capability would be desirable, as offered, for example, by the cascaded filters proposed by Fleet and Jepson [9]. They constructed space-time oriented filters as cascades of the CS filter. The CS filter is defined as the difference of spatial Gaussians, each multiplied by a temporally decaying exponential, which corresponds to a temporal center (C)-surround (S) model in analogy to biological systems. It also includes a temporal delay term embodied in the S part. The space-time orientation is obtained by convolving the CS filter with a sum of (space-time) Dirac distributions, each centered at a specific location in space-time, so that the result is an oriented pattern. As shown by Fleet and Jepson [9], layered cascades of the CS filter improve the orientation specificity of the filters. As for their tuning capabilities, these layered cascades can, in addition to their specific orientation, select features moving at high or low speed. This is done by setting the ratio of the spatial or of the temporal filter sizes, respectively, to one.

We would like to use a filter that exhibits a wide range of space-time tuning and can also be used as an energy filter to extract the image flow. The simplest way to combine these two aspects is the space-time DOG filter used in cascade. If we replace the temporal exponential decay term in the CS filter by a temporal Gaussian and eliminate the temporal delay, we get a space-time DOG. This filter has 9 parameters. Four are the center and surround filter sizes, with the spatial sizes in x and y assumed equal. The spatial and temporal offsets make up 3 more, and the center and surround multiplicative constants make up the remaining 2. The only reason for not using the CS cascade filter of Fleet and Jepson directly as an energy filter is that its energy expression is more complex than that of the space-time DOG cascade. The CS filter decays exponentially in time, with an exponent linear in t, whereas the DOG filter decays as a Gaussian, which makes the temporal integral needed for the energy expression easier to perform.

We should remind ourselves that the space-time DOG and Gabor filters are non-causal. This is a consequence of the Paley-Wiener theorem [12], which states that a square-integrable amplitude response |f̂(w)| can be the amplitude response of a causal filter f(t) if and only if

$$\int_{-\infty}^{\infty} \frac{\left|\ln|\hat f(w)|\right|}{1+w^{2}}\,dw < \infty. \tag{2.1}$$

A Gaussian temporal response, |f̂(w)| ∝ exp(−w²μ²/2), violates (2.1), since |ln|f̂(w)|| grows like w². The CS filter, on the contrary, is causal because of its one-sided exponential temporal decay.
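A quick numerical look at criterion (2.1), assuming μ = 1 (a hypothetical choice). The sketch truncates the integral to |w| ≤ W. For the Gaussian amplitude the truncated value is exactly W − arctan(W), so it grows without bound. For the amplitude 1/sqrt(1 + w²) of the causal one-sided exponential h(t) = e^{−t} (t ≥ 0), the truncated value levels off towards π ln 2 ≈ 2.18.

```python
import math


def pw_integral(log_amp, W, n=200000):
    """Midpoint-rule value of the integral in (2.1), truncated to |w| <= W."""
    h = 2.0 * W / n
    total = 0.0
    for i in range(n):
        w = -W + (i + 0.5) * h
        total += abs(log_amp(w)) / (1.0 + w * w) * h
    return total


MU = 1.0


def gauss_log(w):
    return -0.5 * (w * MU) ** 2              # ln of exp(-w^2 mu^2 / 2)


def expo_log(w):
    return -0.5 * math.log(1.0 + w * w)      # ln |1 / (1 + i w)|, spectrum of e^{-t}, t >= 0


for W in (10.0, 100.0, 1000.0):
    print(W, pw_integral(gauss_log, W), pw_integral(expo_log, W))
# The Gaussian column equals W - atan(W) and keeps growing;
# the exponential column approaches pi * ln(2).
```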


The space-time DOG filter is given by the following expression

$$D(x,y,t) = A_c \exp\!\left(-\frac{x^2+y^2}{2\sigma_c^2} - \frac{t^2}{2\mu_c^2}\right) - A_s \exp\!\left(-\frac{x^2+y^2}{2\sigma_s^2} - \frac{t^2}{2\mu_s^2}\right), \tag{2.2}$$

where σ_c (σ_s) and μ_c (μ_s) are the center (surround) spatial and temporal filter sizes, respectively. A_c and A_s are adjustable parameters, used in the discrete version of the filter to make the sum of all elements of the mask equal to zero. Its Fourier transform, with the normalization constants of the Gaussian transforms absorbed into A_c and A_s, is given by

$$\hat D(\mathbf{k},w) = A_c \exp\!\left(-\frac{|\mathbf{k}|^2\sigma_c^2 + w^2\mu_c^2}{2}\right) - A_s \exp\!\left(-\frac{|\mathbf{k}|^2\sigma_s^2 + w^2\mu_s^2}{2}\right), \tag{2.3}$$

where the spatial and temporal frequencies are given by k = (k_x, k_y) and w, respectively.
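A sketch of the discrete mask described above, under several assumptions. It uses a cubic sampling grid of unit spacing and the parameter values quoted later in this section (σ_c = μ_c = 1, σ_s = μ_s = 3). A_s is derived from A_c so that the mask elements sum to zero, which is one assumed way of realizing the zero-sum tuning of the adjustable constants. The grid half-width `half` is a hypothetical choice.

```python
import numpy as np


def dog_mask(sc=1.0, ss=3.0, mc=1.0, ms=3.0, Ac=1.0, half=9):
    """Discrete space-time DOG (2.2) on a (2*half+1)^3 grid, rescaled to zero sum."""
    r = np.arange(-half, half + 1, dtype=float)
    x, y, t = np.meshgrid(r, r, r, indexing="ij")
    rho2 = x**2 + y**2
    center = np.exp(-rho2 / (2 * sc**2) - t**2 / (2 * mc**2))
    surround = np.exp(-rho2 / (2 * ss**2) - t**2 / (2 * ms**2))
    As = Ac * center.sum() / surround.sum()   # forces sum(mask) == 0
    return Ac * center - As * surround, As


mask, As = dog_mask()
print(mask.shape, abs(mask.sum()) < 1e-9)  # (19, 19, 19) True
```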

A cascade of filters corresponds to applying a set of linear filters in succession to a collection of signals [9], such that the interval between their successive positions of highest magnitude is given by the offset. In space-time filtering these offsets have a spatial as well as a temporal part. They can also occur in a set of layers, where each layer corresponds to a different collection of space-time offsets.

Let us define, analogously to Fleet and Jepson [9], the one-layer cascade by the expression

$$C(x,y,t) = D(x,y,t) * E(x,y,t), \tag{2.4}$$

where

$$E(x,y,t) = \tfrac12\,\delta(x,y,t) + \tfrac14\,\delta(x+\xi_x,\,y+\xi_y,\,t+\tau) + \tfrac14\,\delta(x-\xi_x,\,y-\xi_y,\,t-\tau). \tag{2.5}$$

Here δ(·) is the Dirac delta (distribution), D(x,y,t) is the space-time DOG given by formula (2.2), and * is the convolution operation. In the frequency domain this cascade is given by

$$\hat C(\mathbf{k},w) = \hat D(\mathbf{k},w)\,\hat E(\mathbf{k},w), \tag{2.6}$$

where

$$\hat E(\mathbf{k},w) = \frac12 + \frac14\left[e^{\,i(\mathbf{k}\cdot\boldsymbol{\xi} + w\tau)} + e^{-i(\mathbf{k}\cdot\boldsymbol{\xi} + w\tau)}\right] = \frac12\left[1 + \cos(\mathbf{k}\cdot\boldsymbol{\xi} + w\tau)\right], \tag{2.7}$$

with ξ = (ξ_x, ξ_y), and D̂(k, w) is given by (2.3).

By increasing (decreasing) the offset values of, for example, the one-layer cascade, we get more (less) specificity to velocity. This can be observed by comparing Figures 1 and 2, or their respective Fourier transforms, Figures 3 and 4. We fix σ_c = 1.0, σ_s = 3.0, μ_c = 1.0, μ_s = 3.0, A_c = 1.0 and A_s = 1.0.


For Figure 1 we have ξ_x = 0.77, ξ_y = 0 and τ = 2.89, whereas for Figure 2, ξ_x = 0.52, ξ_y = 0 and τ = 1.93. Both offset sets correspond to a slope of 15 deg in the x-t plane (or 0.26 pixels per frame). Inspecting Figures 3 and 4 makes it clear that larger offsets (Figure 3) give more tuning to velocity, although they also give more ringing [9] (due to aliasing of adjacent patterns) for small velocities. As described by Fleet and Jepson [9], one way to get less ringing and more velocity specificity is to build cascades out of more than one layer. For example, a two-layer cascade is constructed by convolving two one-layer cascades, each with a different collection of offset values, that is

$$C(x,y,t) = C_1(x,y,t) * C_2(x,y,t), \tag{2.8}$$

where

$$C_1(x,y,t) = D(x,y,t) * E_1(x,y,t), \tag{2.9}$$

$$C_2(x,y,t) = D(x,y,t) * E_2(x,y,t), \tag{2.10}$$

$$E_1(x,y,t) = \tfrac12\,\delta(x,y,t) + \tfrac14\,\delta(x+\xi^{1}_x,\,y+\xi^{1}_y,\,t+\tau^{1}) + \tfrac14\,\delta(x-\xi^{1}_x,\,y-\xi^{1}_y,\,t-\tau^{1}), \tag{2.11}$$

$$E_2(x,y,t) = \tfrac12\,\delta(x,y,t) + \tfrac14\,\delta(x+\xi^{2}_x,\,y+\xi^{2}_y,\,t+\tau^{2}) + \tfrac14\,\delta(x-\xi^{2}_x,\,y-\xi^{2}_y,\,t-\tau^{2}). \tag{2.12}$$
Figure 5 shows this two-layer cascade and Figure 6 its Fourier transform, on the same space-time scale as the one-layer cascade figures. Much less ringing occurs (Figure 6), and the space-time shape of the filter has a support that is elongated along, and narrower across, the particular orientation for which it is tuned.
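The offsets quoted for Figures 1 and 2 can be converted into the tuned orientation with one line of trigonometry. The slope in the x-t plane is ξ_x/τ pixels per frame, and the angle, measured here from the t axis, is its arctangent. The sketch assumes ξ_y = 0, as in both figures, and the function name is illustrative.

```python
import math


def tuned_orientation(xi_x, tau):
    """Slope (pixels/frame) and angle (degrees from the t axis) of a one-layer cascade."""
    slope = xi_x / tau
    return slope, math.degrees(math.atan(slope))


for name, xi_x, tau in [("Figure 1", 0.77, 2.89), ("Figure 2", 0.52, 1.93)]:
    slope, angle = tuned_orientation(xi_x, tau)
    print(f"{name}: {slope:.3f} px/frame, {angle:.1f} deg")
```

Both offset sets give an angle within about 0.1 deg of 15 deg, which is consistent with the text.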

In general, whatever set of parameters we choose for the filter, there is always a specific amount of directional uncertainty, because the filter response is not perfectly tuned to a particular orientation. Besides its response to the orientation for which it is tuned, the filter has a non-zero response to a restricted range of orientations in its neighborhood. Consider, for example, one-dimensional motion (parallel to the x axis) of a given image pattern. To filter the specific direction associated with its velocity (in the x-t plane or, equivalently, in the frequency domain), we should use a filter whose support lies at that orientation and is zero otherwise. In practice, we can only select a given orientation inside a cone whose aperture is proportional to the motion uncertainty. This is because any (real) filter selects not only the particular direction for which it was designed, but also adjacent directions inside a fixed aperture. As a result, there will always be some motion uncertainty, and this will affect the (space-time) sampling properties of the filter. This issue is discussed in detail in the next section.

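The energy (power spectrum) associated with the one-layer cascade is given by

$$\int_{-\infty}^{\infty} d\mathbf{k}\int_{-\infty}^{\infty} dw\,|\hat C(\mathbf{k},w)|^2 = \frac14\int_{-\infty}^{\infty} d\mathbf{k}\int_{-\infty}^{\infty} dw\,\Big\{\hat D^2(\mathbf{k},w)\left[1 + 2\cos(\mathbf{k}\cdot\boldsymbol{\xi}+w\tau) + \cos^2(\mathbf{k}\cdot\boldsymbol{\xi}+w\tau)\right]\Big\}. \tag{2.13}$$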


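Since we assume only translational motion, in which case

$$w = \mathbf{k}\cdot\mathbf{v}, \tag{2.14}$$

where v is the velocity field, we can use cos²θ = (1 + cos 2θ)/2 to rewrite the previous energy expression in the following form

$$\int_{-\infty}^{\infty} d\mathbf{k}\,|\hat C(\mathbf{k},\mathbf{k}\cdot\mathbf{v})|^2 = \int_{-\infty}^{\infty} d\mathbf{k}\,\Big\{\hat D^2(\mathbf{k},\mathbf{k}\cdot\mathbf{v})\Big[\frac38 + \frac14\Big(e^{\,i\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)} + e^{-i\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)}\Big) + \frac1{16}\Big(e^{\,2i\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)} + e^{-2i\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)}\Big)\Big]\Big\}. \tag{2.15}$$

As a next step we develop the energy expression (2.15) by performing the integral over k. For this purpose it is useful to notice that formula (2.15) contains the following algebraic expression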

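$$I(\boldsymbol{\xi},\tau,a,\mathbf{v}) = \int_{-\infty}^{\infty} d\mathbf{k}\,\Big\{\hat D^2(\mathbf{k},\mathbf{k}\cdot\mathbf{v})\Big[e^{\,ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)} + e^{-ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)}\Big]\Big\}, \tag{2.16}$$

where a is a non-negative integer. The values a = 1 and a = 2 appear in (2.15), while a = 0 gives twice the integral of D̂²(k, k·v). If we use the definition of D̂(k, w) and the constraint (2.14), then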

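$$\begin{aligned}
\hat D^2(\mathbf{k},\mathbf{k}\cdot\mathbf{v}) ={}& A_c^2\exp\!\Big(-k_x^2(\sigma_c^2+\mu_c^2v_x^2) - k_y^2(\sigma_c^2+\mu_c^2v_y^2) - 2k_xk_yv_xv_y\mu_c^2\Big)\\
&+ A_s^2\exp\!\Big(-k_x^2(\sigma_s^2+\mu_s^2v_x^2) - k_y^2(\sigma_s^2+\mu_s^2v_y^2) - 2k_xk_yv_xv_y\mu_s^2\Big)\\
&- 2A_cA_s\exp\!\Big(-\frac{k_x^2}{2}\big(\sigma_c^2+\sigma_s^2+(\mu_c^2+\mu_s^2)v_x^2\big) - \frac{k_y^2}{2}\big(\sigma_c^2+\sigma_s^2+(\mu_c^2+\mu_s^2)v_y^2\big)\Big)\\
&\quad\times\exp\!\Big(-k_xk_yv_xv_y(\mu_c^2+\mu_s^2)\Big).
\end{aligned} \tag{2.17}$$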


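Now, by inserting (2.17) into expression (2.16), we get

$$I(\boldsymbol{\xi},\tau,a,\mathbf{v}) = F_1(a,\boldsymbol{\xi},\mathbf{v},\tau) + F_2(a,\boldsymbol{\xi},\mathbf{v},\tau) + F_3(a,\boldsymbol{\xi},\mathbf{v},\tau), \tag{2.18}$$

where

$$F_1(a,\boldsymbol{\xi},\mathbf{v},\tau) = A_c^2\int_{-\infty}^{\infty} d\mathbf{k}\,\Big\{\Big[e^{\,ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)} + e^{-ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)}\Big]\exp\!\Big(-k_x^2(\sigma_c^2+\mu_c^2v_x^2) - k_y^2(\sigma_c^2+\mu_c^2v_y^2) - 2k_xk_yv_xv_y\mu_c^2\Big)\Big\}, \tag{2.19}$$

$$F_2(a,\boldsymbol{\xi},\mathbf{v},\tau) = A_s^2\int_{-\infty}^{\infty} d\mathbf{k}\,\Big\{\Big[e^{\,ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)} + e^{-ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)}\Big]\exp\!\Big(-k_x^2(\sigma_s^2+\mu_s^2v_x^2) - k_y^2(\sigma_s^2+\mu_s^2v_y^2) - 2k_xk_yv_xv_y\mu_s^2\Big)\Big\}, \tag{2.20}$$

and

$$\begin{aligned}
F_3(a,\boldsymbol{\xi},\mathbf{v},\tau) = -2A_cA_s\int_{-\infty}^{\infty} d\mathbf{k}\,\Big\{&\Big[e^{\,ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)} + e^{-ia\mathbf{k}\cdot(\boldsymbol{\xi}+\mathbf{v}\tau)}\Big]\\
&\times\exp\!\Big(-\frac{k_x^2}{2}\big(\sigma_c^2+\sigma_s^2+(\mu_c^2+\mu_s^2)v_x^2\big) - \frac{k_y^2}{2}\big(\sigma_c^2+\sigma_s^2+(\mu_c^2+\mu_s^2)v_y^2\big) - k_xk_yv_xv_y(\mu_c^2+\mu_s^2)\Big)\Big\}.
\end{aligned} \tag{2.21}$$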
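Finally, we use the (Gaussian) integral formula

$$\int_{-\infty}^{\infty} d\mathbf{k}\,\exp\!\Big(i\,\mathbf{k}\cdot\boldsymbol{\varphi} - \tfrac12\,\mathbf{k}^{\mathsf T}A^{-1}\mathbf{k}\Big) = \big[4\pi^2\det A\big]^{1/2}\exp\!\Big(-\tfrac12\,\boldsymbol{\varphi}^{\mathsf T}A\,\boldsymbol{\varphi}\Big), \tag{2.22}$$

which is valid for a symmetric, positive-definite (2 × 2) matrix A, with φ = (φ_x, φ_y). The exponent in (2.19) is −kᵀM_c k, with

$$M_c = \begin{pmatrix}\sigma_c^2+\mu_c^2v_x^2 & \mu_c^2v_xv_y\\ \mu_c^2v_xv_y & \sigma_c^2+\mu_c^2v_y^2\end{pmatrix},\qquad \det M_c = (\sigma_c^2+\mu_c^2v_x^2)(\sigma_c^2+\mu_c^2v_y^2)-(\mu_c^2v_xv_y)^2,$$

$$M_c^{-1} = \frac{1}{\det M_c}\begin{pmatrix}\sigma_c^2+\mu_c^2v_y^2 & -\mu_c^2v_xv_y\\ -\mu_c^2v_xv_y & \sigma_c^2+\mu_c^2v_x^2\end{pmatrix}.$$

We can therefore apply (2.22) with A = (2M_c)⁻¹ and φ = ±a(ξ + vτ). The two exponentials give the same contribution, so F_1 is given by

$$F_1(a,\boldsymbol{\xi},\mathbf{v},\tau) = \frac{2\pi A_c^2}{\sqrt{\det M_c}}\exp\!\Big(-\frac{a^2}{4}\,(\boldsymbol{\xi}+\mathbf{v}\tau)^{\mathsf T}M_c^{-1}(\boldsymbol{\xi}+\mathbf{v}\tau)\Big). \tag{2.23}$$

F_2 and F_3 are obtained analogously. F_2 follows from (2.23) by replacing A_c, σ_c, μ_c with A_s, σ_s, μ_s, which defines the matrix M_s. The exponent in (2.21) is −½kᵀ(M_c + M_s)k, which gives

$$F_3(a,\boldsymbol{\xi},\mathbf{v},\tau) = -\frac{8\pi A_cA_s}{\sqrt{\det(M_c+M_s)}}\exp\!\Big(-\frac{a^2}{2}\,(\boldsymbol{\xi}+\mathbf{v}\tau)^{\mathsf T}(M_c+M_s)^{-1}(\boldsymbol{\xi}+\mathbf{v}\tau)\Big).$$

The complete expression for the energy follows from (2.15), (2.16) and (2.18). The energy equals (3/16)I(ξ,τ,0,v) + (1/4)I(ξ,τ,1,v) + (1/16)I(ξ,τ,2,v), where the first term comes from the constant term of (2.15), 3/8, since I at a = 0 is twice the integral of D̂². With the notation

$$g_c = \frac{2\pi A_c^2}{\sqrt{\det M_c}},\quad g_s = \frac{2\pi A_s^2}{\sqrt{\det M_s}},\quad g_{cs} = \frac{8\pi A_cA_s}{\sqrt{\det(M_c+M_s)}},$$

$$q_c = \boldsymbol{\rho}^{\mathsf T}M_c^{-1}\boldsymbol{\rho},\quad q_s = \boldsymbol{\rho}^{\mathsf T}M_s^{-1}\boldsymbol{\rho},\quad q_{cs} = \boldsymbol{\rho}^{\mathsf T}(M_c+M_s)^{-1}\boldsymbol{\rho},\quad \boldsymbol{\rho} = \boldsymbol{\xi}+\mathbf{v}\tau,$$

the energy is

$$\begin{aligned}
\int_{-\infty}^{\infty} d\mathbf{k}\,|\hat C(\mathbf{k},\mathbf{k}\cdot\mathbf{v})|^2 ={}& g_c\Big[\tfrac3{16} + \tfrac14\,e^{-q_c/4} + \tfrac1{16}\,e^{-q_c}\Big] + g_s\Big[\tfrac3{16} + \tfrac14\,e^{-q_s/4} + \tfrac1{16}\,e^{-q_s}\Big]\\
&- g_{cs}\Big[\tfrac3{16} + \tfrac14\,e^{-q_{cs}/2} + \tfrac1{16}\,e^{-2q_{cs}}\Big].
\end{aligned} \tag{2.24}$$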
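Because a closed form like (2.24) is easy to get wrong by hand, a numerical cross-check is useful. The sketch below assumes NumPy and uses the parameter values given for the figures together with an arbitrary test velocity. It compares (2.24) with a direct Riemann-sum evaluation of ∫ dk |Ĉ(k, k·v)|² built from (2.3), (2.7) and (2.14). The grid extent K and resolution n are assumptions chosen so that the Gaussian integrands are well resolved and negligible at the boundary.

```python
import numpy as np

# Parameter values used for Figures 1-4. The test velocity below is an arbitrary assumption.
PARAMS = dict(Ac=1.0, As=1.0, sc=1.0, ss=3.0, mc=1.0, ms=3.0)


def energy_numeric(v, xi, tau, Ac, As, sc, ss, mc, ms, K=8.0, n=801):
    """Riemann sum of |C(k, k.v)|^2 = D^2 |E|^2 over a square k grid, from (2.3), (2.7), (2.14)."""
    k = np.linspace(-K, K, n)
    dk = k[1] - k[0]
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    w = kx * v[0] + ky * v[1]                                        # constraint (2.14)
    d_hat = (Ac * np.exp(-(k2 * sc**2 + w**2 * mc**2) / 2)
             - As * np.exp(-(k2 * ss**2 + w**2 * ms**2) / 2))       # (2.3)
    e_hat = 0.5 * (1.0 + np.cos(kx * xi[0] + ky * xi[1] + w * tau))  # (2.7)
    return float(np.sum((d_hat * e_hat) ** 2) * dk * dk)


def energy_closed(v, xi, tau, Ac, As, sc, ss, mc, ms):
    """Closed form (2.24)."""
    v = np.asarray(v, dtype=float)
    rho = np.asarray(xi, dtype=float) + v * tau      # rho = xi + v tau
    vv = np.outer(v, v)
    Mc = sc**2 * np.eye(2) + mc**2 * vv
    Ms = ss**2 * np.eye(2) + ms**2 * vv
    Mt = Mc + Ms

    def quad(M):                                     # rho^T M^{-1} rho
        return float(rho @ np.linalg.solve(M, rho))

    gc = 2 * np.pi * Ac**2 / np.sqrt(np.linalg.det(Mc))
    gs = 2 * np.pi * As**2 / np.sqrt(np.linalg.det(Ms))
    gcs = 8 * np.pi * Ac * As / np.sqrt(np.linalg.det(Mt))
    qc, qs, qcs = quad(Mc), quad(Ms), quad(Mt)
    return float(gc * (3 / 16 + np.exp(-qc / 4) / 4 + np.exp(-qc) / 16)
                 + gs * (3 / 16 + np.exp(-qs / 4) / 4 + np.exp(-qs) / 16)
                 - gcs * (3 / 16 + np.exp(-qcs / 2) / 4 + np.exp(-2 * qcs) / 16))


v, xi, tau = (0.3, 0.1), (0.77, 0.0), 2.89
num = energy_numeric(v, xi, tau, **PARAMS)
closed = energy_closed(v, xi, tau, **PARAMS)
print(num, closed, abs(num - closed) / abs(closed))
```

The relative difference printed on the last line should be at the level of floating-point rounding. A larger value would point to an error in either the closed form or the grid settings.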


If we want to extract the optical flow field by using a set of energy filters, each tuned to a different orientation in space-time, we are confronted with another source of motion uncertainty. It arises because the optical flow field has to be determined from the outputs of a number of energy filters with different orientations. For example, in Heeger's approach the estimated field (v_x, v_y) minimizes a cost function that sums, over all twelve filters, the squared differences between the measured (motion) energy and its predicted value. As a result, there will always be a non-zero contribution from filters that do not correspond to the right orientation, because neighboring filters overlap in shape. To reduce the uncertainty in the motion estimate caused by this interaction among neighboring filters, we have to enhance the orientation specificity of each filter, which reduces the lateral overlap. The consequence is that, for a fixed number of filters, some orientations can no longer be selected, mainly those lying between the orientations of neighboring filters. So we face a trade-off between selecting a specific orientation in space-time with minimal uncertainty and keeping down the number of filters needed to span all orientations.
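Below is a toy version of the estimation step just described, with ingredients that are explicitly hypothetical. Each of twelve filters is represented only by the velocity u_j it is tuned to. A Gaussian tuning curve P_j(v) = exp(−|v − u_j|²/(2s²)) stands in for the predicted energies; Heeger derives these analytically, whereas here they are invented for illustration. The estimate is the grid point that minimizes the sum of squared differences between measured and predicted energies.

```python
import math

S = 0.6  # hypothetical tuning width of each filter
# Hypothetical preferred velocities: 6 directions x 2 speeds = 12 filters.
TUNED = [(r * math.cos(math.pi * j / 3), r * math.sin(math.pi * j / 3))
         for r in (0.5, 1.5) for j in range(6)]


def predicted(v):
    """Hypothetical predicted energy of every filter for velocity v."""
    return [math.exp(-((v[0] - ux) ** 2 + (v[1] - uy) ** 2) / (2 * S ** 2))
            for ux, uy in TUNED]


def estimate(measured, grid):
    """Grid point minimizing the sum of squared (measured - predicted) energies."""
    def cost(v):
        return sum((m - p) ** 2 for m, p in zip(measured, predicted(v)))
    return min(grid, key=cost)


grid = [(i / 20, j / 20) for i in range(-40, 41) for j in range(-40, 41)]
measured = predicted((0.5, -0.25))   # noise-free "measurements"
print(estimate(measured, grid))      # (0.5, -0.25)
```

Shrinking S in this toy makes each filter more specific. It also leaves velocities between the tuned values with almost no response, which is the trade-off described above.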

The method of optical flow extraction used by Heeger [5] can determine the optical flow for a collection of different types of moving patterns. It nevertheless has some limitations that should be mentioned:

1. It assumes that all images can be modeled as (locally) random patterns.

2. In order to use Parseval's theorem, the expression of the energy has to be approximated.

3. The optimization procedure, which has to be performed at each image pixel, is computationally very expensive.

In particular, approximating the integral in Parseval's theorem leads to errors in the estimated optical flow in regions where the flow is discontinuous. This makes it difficult to use the estimate as input for region segmentation. This and other questions will be discussed in another paper [11].


3 Space-time sampling

In the previous section I discussed extracting the optical flow with space-time filters considered as energy filters. I also proposed using, as energy filters, cascades of space-time filters like those constructed by Fleet and Jepson, which is possible once the temporal exponential is replaced by a Gaussian. Adopting a filtering approach to the extraction of the optical flow means that the filter must be sampled, or, more specifically, that a space-time sampling of the filter must be performed. The temporal sampling issue arises directly from the fact that the temporal interval between successive images used in space-time filtering, although small, is finite. A natural question in this context is how much we can (temporally) undersample the filter while still being able to reconstruct the original signal. More completely, the question concerns the convolution of the image with the space-time filter. For uniformly translating patterns, the spatial and temporal sampling rates are not independent. As is shown next, if there is a certain amount of motion uncertainty, then there is a maximum sampling interval, in either space or time, such that aliasing does not occur. This maximum sampling interval is shown to be a (non-linear) function of the motion uncertainty.

I will first describe very succinctly the sampling theorem for one-dimensional functions and generalize it to three dimensions (two spatial and one temporal). Next, I show that for a uniformly translating pattern it is only necessary to sample in either the spatial or the temporal variables. Finally, I relate the motion uncertainty to the maximum sampling interval for which there is no aliasing.

The sampling theorem [12] gives us a mathematical formulation for reconstructing a continuous function from a collection of samples of this function over a specific domain. When we deal with real signals, on the other hand, there is always a certain amount of under- or oversampling, depending on the specific architecture of the filters being used. In particular, undersampling means that the spatial or temporal sampling interval is larger than the one established by the sampling theorem, that is, the sampling rate is below the Nyquist rate. In that case we have to deal with the aliasing problem. The degree of aliasing that can be tolerated while the original function can still be reconstructed, modulo small distortions, depends on the filter characteristics and also on the type of data being filtered.

Let us start with one-dimensional signals, represented by the function $f(x)$. We obtain a sampled version of $f(x)$, $f_s(x)$, by multiplying it by an (infinite) sum of (Dirac) delta distributions whose sample points are equidistant (spaced by $p_x$). The sampled function $f_s(x)$ is given by

$$ f_s(x) = f(x)\,\Sigma(x, p_x), \qquad (3.1) $$

where

$$ \Sigma(x, p_x) = \sum_{n_x=-\infty}^{\infty} \delta(x - n_x p_x). \qquad (3.2) $$

In the frequency domain, (3.1) is represented by the convolution

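$$ \hat f_s(k_x) = \hat f(k_x) * \hat\Sigma(k_x, p_x), \qquad (3.3) $$

with

$$ \hat\Sigma(k_x, p_x) = \frac{1}{p_x} \sum_{n_x=-\infty}^{\infty} \delta\!\left(k_x - \frac{n_x}{p_x}\right). \qquad (3.4) $$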
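As a minimal illustration of the lobe overlap produced by (3.3) and (3.4), the Python sketch below uses the fact that sampling at interval $p_x$ replicates the spectrum at multiples of $1/p_x$, so a component at frequency $k_x$ cannot be told apart from one at $k_x - m/p_x$ for any integer $m$. The helper name `alias_frequency` and the test values are illustrative, not taken from the text.

```python
import math


def alias_frequency(k: float, p: float) -> float:
    """Return the frequency in [-1/(2p), 1/(2p)) that k folds onto when sampled at interval p.

    Sampling replicates the spectrum at multiples of 1/p, as in (3.3)-(3.4),
    so k and k - m/p (m an integer) produce identical samples.
    """
    period = 1.0 / p
    # Subtract the multiple of 1/p nearest to k.
    return k - period * math.floor(k / period + 0.5)


# Frequency 0.7 sampled at p = 1; avoiding aliasing would need p <= 1/(2 * 0.7).
k, p = 0.7, 1.0
k_alias = alias_frequency(k, p)  # about -0.3
same = all(
    math.isclose(math.cos(2 * math.pi * k * n * p),
                 math.cos(2 * math.pi * k_alias * n * p), abs_tol=1e-9)
    for n in range(-20, 21)
)
```

The samples of the two cosines coincide, which is exactly the ambiguity that appears when adjacent lobes of (3.3) overlap.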

If we assume that $\hat f(k_x)$ is band-limited ($\hat f(k_x)$ is zero for $|k_x| > L_x$), then it is easy to check that, unless $p_x \le 1/(2L_x)$, there will exist a region where adjacent lobes overlap, which is a sign of undersampling, and as a consequence we have the aliasing phenomenon. When the lobes do not overlap, we can isolate the central one by multiplying (3.3) by a function $H(k_x)$, for example the ideal low-pass filter (which is 1 for $|k_x| < L_x$ and 0 otherwise), as a result of which (3.3) reduces to $\hat f(k_x)$ (up to the constant factor $1/p_x$). Consequently, $f(x)$ can be exactly recovered from its samples. We can summarize this result by stating that, if $\hat f(k_x)$ is band-limited and has no singularities at its extremities ($k_x = \pm L_x$), then

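$$ f(x) = \sum_{n_x=-\infty}^{\infty} f\!\left(\frac{n_x}{2L_x}\right) \operatorname{sinc}\!\left(2L_x\left(x - \frac{n_x}{2L_x}\right)\right), \qquad (3.5) $$

where

$$ \operatorname{sinc}(x) = \frac{\sin \pi x}{\pi x}, \qquad (3.6) $$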

which  is  a  version  of  the  sampling  theorem  [13]. 
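A minimal Python sketch of the reconstruction formula (3.5) with the normalized sinc (3.6). It assumes, for illustration only, that all but finitely many samples vanish, so the infinite sum becomes a finite one; the helper names and the test signal are hypothetical.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc of (3.6): sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples: dict, L: float, x: float) -> float:
    """Evaluate (3.5) at x.

    samples maps the integer n to f(n / (2L)); missing indices are assumed
    to be zero, which turns the infinite sum into a finite one.
    Note that 2L (x - n / (2L)) = 2L x - n.
    """
    return sum(value * sinc(2.0 * L * x - n) for n, value in samples.items())


# Hypothetical test signal, band-limited to |k| <= L.
L = 2.0


def f(x: float) -> float:
    return sinc(2 * L * x) + 0.5 * sinc(2 * L * x - 3)


samples = {n: f(n / (2 * L)) for n in range(-10, 11)}
approx = reconstruct(samples, L, 0.37)  # agrees with f(0.37)
```

The test signal is a sum of two shifted sincs, so it is band-limited to $|k_x| \le L_x$ and has only two nonzero Nyquist samples, which makes the truncated sum exact up to rounding.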


We can generalize the sampling theorem to three-dimensional functions. Given that $f(x, y, t)$ is a (space-time) function and $\hat f(k_x, k_y, w)$ its Fourier transform ($k_x$, $k_y$ and $w$ are the Fourier variables associated with $x$, $y$ and $t$), if $\hat f$ is zero for $|k_x| > L_x$, $|k_y| > L_y$ and $|w| > L_t$, and has no singularities at $|k_x| = L_x$, $|k_y| = L_y$ and $|w| = L_t$, then, by the sampling theorem,


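$$ f(x, y, t) = \sum_{n_x=-\infty}^{\infty} \sum_{n_y=-\infty}^{\infty} \sum_{n_t=-\infty}^{\infty} f\!\left(\frac{n_x}{2L_x}, \frac{n_y}{2L_y}, \frac{n_t}{2L_t}\right) \operatorname{sinc}\!\left(2L_x\left(x - \frac{n_x}{2L_x}\right)\right) \operatorname{sinc}\!\left(2L_y\left(y - \frac{n_y}{2L_y}\right)\right) \operatorname{sinc}\!\left(2L_t\left(t - \frac{n_t}{2L_t}\right)\right). \qquad (3.7) $$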


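In the case of translational motion [14,15], for which

$$ \hat f(k_x, k_y, w) = \hat f(k_x, k_y)\,\delta(w - \mathbf{k}\cdot\mathbf{v}) \qquad (3.8) $$

(which is equivalent to saying that $\hat f$ is different from zero only on the plane $w = \mathbf{k}\cdot\mathbf{v}$), the sampling theorem reads

$$ f(x - v_x t, y - v_y t) = \sum_{n_x=-\infty}^{\infty} \sum_{n_y=-\infty}^{\infty} f\!\left(\frac{n_x}{2L_x}, \frac{n_y}{2L_y}\right) \operatorname{sinc}\!\left(2L_x\left(x - v_x t - \frac{n_x}{2L_x}\right)\right) \operatorname{sinc}\!\left(2L_y\left(y - v_y t - \frac{n_y}{2L_y}\right)\right). \qquad (3.9) $$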
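Equation (3.9) says that a uniformly translating pattern is fixed by its spatial samples taken at a single instant. The one-dimensional Python sketch below assumes the velocity $v$ is known exactly (no motion uncertainty) and uses a hypothetical two-sinc pattern; all names are illustrative.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def translating_pattern(samples: dict, L: float, v: float, x: float, t: float) -> float:
    """One-dimensional form of (3.9): f(x - v t) from spatial samples only.

    samples maps n to f(n / (2L)), taken once at t = 0; missing indices are
    assumed zero. No temporal samples are used; v is assumed known exactly.
    """
    return sum(value * sinc(2.0 * L * (x - v * t) - n) for n, value in samples.items())


L, v = 2.0, 0.8


def pattern(u: float) -> float:
    """Hypothetical pattern at t = 0, band-limited to |k| <= L."""
    return sinc(2 * L * u) + 0.5 * sinc(2 * L * u - 3)


samples = {n: pattern(n / (2 * L)) for n in range(-10, 11)}
value = translating_pattern(samples, L, v, 1.1, 0.5)  # equals pattern(1.1 - 0.8 * 0.5)
```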

This means that we only need to sample $f(x, y)$, at the Nyquist rate, in its spatial variables. Another way to understand this is through Fourier analysis, which, for simplicity, we apply to the two-dimensional case ($x$-$t$ space). For pure translation, because of (3.8), the sampled function $\hat f_s(k_x, w)$ (analogously to (3.3)) is given by


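$$ \hat f_s(k_x, w) = \hat f(k_x)\,\delta(w - k_x v_x) * \left[\sum_{n_x=-\infty}^{\infty} 2L_x\,\delta(k_x - 2n_xL_x) \sum_{n_t=-\infty}^{\infty} 2L_t\,\delta(w - 2n_tL_t)\right], \qquad (3.10) $$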


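which can be rewritten in the form

$$ \hat f_s(k_x, w) = \sum_{n_x=-\infty}^{\infty} \sum_{n_t=-\infty}^{\infty} \int d\xi \int dw'\,\hat f(\xi)\,\delta(w' - \xi v_x)\,2L_x\,\delta(k_x - \xi - 2n_xL_x)\,2L_t\,\delta(w - w' - 2n_tL_t). \qquad (3.11) $$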


By using

$$ \int dx\,\delta(x - a)\,\delta(x - b) = \delta(a - b) \qquad (3.12) $$

in the integral over $w'$, and

$$ \int dx\,f(x)\,\delta(x - a)\,\delta(x - b) = f(a)\,\delta(a - b) \qquad (3.13) $$

in the $\xi$ integral, we find that (3.11) results in

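$$ \hat f_s(k_x, w) = \sum_{n_x=-\infty}^{\infty} \hat f(k_x - 2n_xL_x)\,4L_xL_t \sum_{n_t=-\infty}^{\infty} \delta\big[w - v_xk_x + 2(n_xv_xL_x - n_tL_t)\big]. \qquad (3.14) $$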
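The two integrations that take (3.11) to (3.14) can be written out explicitly. The following intermediate step is added only for readability; it uses (3.12), the sifting property of $\delta$, and the fact that $\delta$ is even:

$$ \int dw'\,\delta(w' - \xi v_x)\,\delta\big(w' - (w - 2n_tL_t)\big) = \delta(w - \xi v_x - 2n_tL_t), $$

$$ \int d\xi\,\hat f(\xi)\,\delta\big(\xi - (k_x - 2n_xL_x)\big)\,\delta(w - \xi v_x - 2n_tL_t) = \hat f(k_x - 2n_xL_x)\,\delta\big[w - v_x(k_x - 2n_xL_x) - 2n_tL_t\big], $$

and since $w - v_x(k_x - 2n_xL_x) - 2n_tL_t = w - v_xk_x + 2(n_xv_xL_x - n_tL_t)$, collecting the constant $2L_x \cdot 2L_t = 4L_xL_t$ gives (3.14).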


We can conclude that, if we start by assuming that $f(x - v_x t)$ is sampled independently in its spatial and temporal variables, then, due to the constraint of uniform translation, we only need to sample in the spatial (or temporal) variable. So, we can simplify (3.10) to the following form

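$$ \hat f_s(k_x, w) = \hat f(k_x)\,\delta(w - k_xv_x) * \left[\sum_{n_x=-\infty}^{\infty} 2L_x\,\delta(k_x - 2n_xL_x)\right], \qquad (3.15) $$

or, by expanding the convolution,

$$ \hat f_s(k_x, w) = \sum_{n_x=-\infty}^{\infty} \hat f(k_x - 2n_xL_x)\,2L_x\,\delta\big(w - v_x(k_x - 2n_xL_x)\big), \qquad (3.16) $$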

which leads us to the two-dimensional version of (3.9). Expression (3.16) is identical to the one describing Burr's experiment [15], which consists of sampling in space, at a fixed temporal interval, a pattern that moves at a constant rate.

For illustration, if we consider a (space-time) band-limited function describing uniform translational motion, then, by the constraint (3.8), its support in the frequency domain is a line segment, as shown in Figure 7. Its sampled version, satisfying (3.16) with $M_x = 2L_x$, consists of a collection of replicas of the original line segment, uniformly spaced at intervals of $M_x$ (see Figure 8).
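The replica structure of (3.16) shown in Figures 7 and 8 can be listed directly. This Python sketch, with a hypothetical function name and illustrative parameter values, returns the end points of each replicated line segment for a pattern band-limited to $|k_x| \le L_x$ and sampled with $M_x = 2L_x$.

```python
def replica_segments(L: float, v: float, n_range) -> list:
    """End points of the replicated spectral segments described by (3.16).

    The unsampled support is the segment w = v k_x, |k_x| <= L. Replica n is
    that segment shifted by n * M_x along k_x, with M_x = 2L, so it lies on
    w = v (k_x - 2 n L) for k_x in [2 n L - L, 2 n L + L].
    """
    segments = []
    for n in n_range:
        centre = 2 * n * L
        k0, k1 = centre - L, centre + L
        segments.append(((k0, v * (k0 - centre)), (k1, v * (k1 - centre))))
    return segments


# Illustrative values: L_x = 1, v_x = 0.5, replicas n = -1, 0, 1.
segs = replica_segments(L=1.0, v=0.5, n_range=range(-1, 2))
# With M_x = 2 L_x the k_x ranges of neighbouring replicas only touch at their ends.
```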

From this we can deduce that, once the support of $\hat f$ is restricted to straight lines whose slope is given by $v_x$, its sampling rate equals the spatial sampling rate (or, equivalently, the temporal sampling rate). The function $f(x, t)$ (which is identical to $f(x - v_x t)$) can be reconstructed from $\hat f_s(k_x, w)$ by applying a filter whose support is parallel to the line $w = k_xv_x$; moreover, as shown by Crick et al. [14], this support can be reduced to an infinitesimally narrow strip, as long as there is no motion uncertainty. This means that, for translational motion, we can increase the sampling interval $p_x$ as much as we wish, provided that we can measure the velocity $v_x$ exactly. With real images, on the other hand, there is always a certain degree of uncertainty in the motion measurement, so the previous considerations do not hold. This leads us to consider the sampling theorem in the presence of noisy data (which generates motion uncertainty), and to ask how the sampling theorem, as described above, must be modified to deal with motion uncertainty. Specifically, in the presence of motion uncertainty, it is no longer possible to increase the sampling interval arbitrarily without getting aliasing. This establishes a relationship between the maximum sampling interval in space (or, equivalently, the minimum interval in the frequency domain) and motion uncertainty.

We know that, under translational motion (let us consider only one-dimensional motion), the Fourier transform of a space-time function has support on a line passing through the origin whose slope is given by the velocity of the moving pattern. If we introduce a specific degree of uncertainty in the velocity, then this support becomes a (one-dimensional) cone whose aperture increases with the velocity uncertainty (see Figures 9 and 10).

Considering  the  case  of  a  band-limited  function  (with  finite  support  in  the  frequency  domain),  we 
can  use  polar  coordinates  to  describe  its  (two-dimensional)  variables. 

For the angular variable $\theta$ we have $\theta = \arctan v_x$, and the radial variable $r$ is the maximum of $\sqrt{w^2 + k_x^2}$ over the support. The motion uncertainty $\Delta v_x$ is given by $\Delta v_x = \tan(\theta + \Delta\theta/2) - \tan(\theta - \Delta\theta/2)$, where $\Delta\theta$ is the angular aperture of the cone, centered at $\theta$. For small values of $\Delta\theta$, $\Delta v_x$ can be approximated by $\delta v_x = \sec^2\theta\,\delta\theta$.
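A quick numerical check of the small-aperture approximation, assuming a cone of aperture $\Delta\theta$ centred at $\theta = \arctan v_x$, so that the exact spread is $\tan(\theta + \Delta\theta/2) - \tan(\theta - \Delta\theta/2)$. The velocity $v_x = 1.5$ and the helper names are illustrative.

```python
import math


def delta_v_exact(theta: float, dtheta: float) -> float:
    """Velocity range spanned by a cone of aperture dtheta centred at theta."""
    return math.tan(theta + dtheta / 2) - math.tan(theta - dtheta / 2)


def delta_v_linear(theta: float, dtheta: float) -> float:
    """Small-aperture approximation: delta v = sec^2(theta) * delta theta."""
    return dtheta / math.cos(theta) ** 2


theta = math.atan(1.5)  # illustrative velocity v_x = 1.5
errors = [
    abs(delta_v_exact(theta, d) - delta_v_linear(theta, d)) / delta_v_exact(theta, d)
    for d in (0.2, 0.02, 0.002)
]
# The relative error shrinks roughly as dtheta**2.
```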

If we sample $f(x - v_x t)$ along the $x$ direction at intervals of $p_x$ (or $M_x$ in the frequency domain) (Figure 10), then it is easy to show that, for a fixed motion uncertainty, there exists a minimum value of $M_x$, $M_x^{\min}$, such that adjacent patterns do not overlap.

If we decrease $M_x$ below this threshold, aliasing occurs. This establishes a relationship between $M_x^{\min}$ and $\Delta v_x$, as stated in the following theorem.


Theorem: Let $f(x, t)$ be a band-limited function describing a uniformly translating pattern whose velocity $v_x$, assumed to be different from zero, is measured within an uncertainty range $\Delta v_x$. Then there exists a minimum value $M_x^{\min}$ of the spatial-frequency sampling interval below which aliasing occurs. $M_x^{\min}$ is related to $\Delta v_x$ by

$$ M_x^{\min} = \frac{2r\sin(\Delta\theta/2)\sqrt{1 + \tan^2\theta}}{\tan\theta + \tan(\Delta\theta/2)}, $$




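where

$$ \Delta\theta = \arctan\!\left(v_x + \frac{\Delta v_x}{2}\right) - \arctan\!\left(v_x - \frac{\Delta v_x}{2}\right), \qquad r = \max\sqrt{k_x^2 + w^2}, \qquad \tan\theta = v_x. $$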
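The theorem translates directly into a small function. This is a sketch under stated assumptions: $v_x > 0$ and $0 < \Delta v_x < 2v_x$, so that the whole cone lies in $0 < \theta < \pi/2$ as in the geometry of the proof; the function name and the example values are ours.

```python
import math


def m_min(v: float, dv: float, r: float) -> float:
    """Minimum spatial-frequency sampling interval M_x^min from the theorem.

    v  : measured velocity v_x (assumed 0 < dv < 2 v, so the cone stays in
         the first quadrant, 0 < theta < pi/2, as in the proof's geometry)
    dv : velocity uncertainty Delta v_x
    r  : radius of the spectral support, max sqrt(k_x**2 + w**2)
    """
    dtheta = math.atan(v + dv / 2) - math.atan(v - dv / 2)
    tan_theta = v
    return (2 * r * math.sin(dtheta / 2) * math.sqrt(1 + tan_theta ** 2)
            / (tan_theta + math.tan(dtheta / 2)))


# Illustrative values: v_x = 1, Delta v_x = 0.2, r = 1.
m = m_min(1.0, 0.2, 1.0)
```

$M_x^{\min}$ scales linearly with $r$ and vanishes as $\Delta v_x \to 0$, consistent with the uncertainty-free case, where the sampling interval could be increased at will.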


Proof: 


We can observe from Figure 11 (or Figure 12) that there exists a point $P$ in the $(r, \theta)$ plane where adjacent patterns, corresponding to replicas of a (one-dimensional) cone, touch without overlapping. This point is the solution of the following equations

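$$ r\cos\theta_1 = M_x^{\min} + d\cos\theta_2 \qquad (3.17) $$

and

$$ r\sin\theta_1 = d\sin\theta_2, \qquad (3.18) $$

where $\theta_1 = \theta - \Delta\theta/2$ and $\theta_2 = \theta + \Delta\theta/2$.

Solving (3.18) for $d$ and substituting into (3.17), we get

$$ M_x^{\min} = r\sin\theta_1(\cot\theta_1 - \cot\theta_2), \qquad (3.19) $$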


or

$$ M_x^{\min} = r\sin(\theta - \Delta\theta/2)\left[\frac{1}{\tan(\theta - \Delta\theta/2)} - \frac{1}{\tan(\theta + \Delta\theta/2)}\right]. \qquad (3.20) $$


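Expanding the sine and tangents in (3.20), we get the following expression

$$ M_x^{\min} = r\left(\sin\theta\cos\frac{\Delta\theta}{2} - \cos\theta\sin\frac{\Delta\theta}{2}\right)\left[\frac{1 + \tan\theta\tan(\Delta\theta/2)}{\tan\theta - \tan(\Delta\theta/2)} - \frac{1 - \tan\theta\tan(\Delta\theta/2)}{\tan\theta + \tan(\Delta\theta/2)}\right], \qquad (3.21) $$

which, after some algebra, leads to

$$ M_x^{\min} = \frac{2r\sin(\Delta\theta/2)\sqrt{1 + \tan^2\theta}}{\tan\theta + \tan(\Delta\theta/2)}, \qquad (3.22) $$

with $\Delta\theta = \theta_2 - \theta_1$, $\theta_2 = \arctan(v_x + \Delta v_x/2)$ and $\theta_1 = \arctan(v_x - \Delta v_x/2)$.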

This  concludes  the  proof. 
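The algebra from (3.19) to (3.22), and the claim that the cones touch at $P$, can both be checked numerically. The sketch below (hypothetical helper name, illustrative inputs) takes $\theta = \arctan v_x$ and $\theta_{1,2} = \theta \mp \Delta\theta/2$, as in the proof, evaluates (3.19) and (3.22), and then intersects the edge at $\theta_1$ of the original cone with the edge at $\theta_2$ of the replica shifted by $M_x^{\min}$ along $k_x$; the intersection should lie exactly at radius $r$.

```python
import math


def check_theorem(v: float, dv: float, r: float):
    """Return (M from (3.19), M from (3.22), radius of the touching point P)."""
    dtheta = math.atan(v + dv / 2) - math.atan(v - dv / 2)
    theta = math.atan(v)
    th1, th2 = theta - dtheta / 2, theta + dtheta / 2
    # (3.19): M = r sin(theta1) (cot theta1 - cot theta2)
    m_319 = r * math.sin(th1) * (1 / math.tan(th1) - 1 / math.tan(th2))
    # (3.22): closed form of the theorem
    m_322 = (2 * r * math.sin(dtheta / 2) * math.sqrt(1 + math.tan(theta) ** 2)
             / (math.tan(theta) + math.tan(dtheta / 2)))
    # Ray at theta1 from the origin meets the ray at theta2 from (M, 0)
    # at distance t = M sin(theta2) / sin(theta2 - theta1) from the origin.
    t = m_322 * math.sin(th2) / math.sin(th2 - th1)
    return m_319, m_322, t


results = [check_theorem(*args) for args in ((1.0, 1.0, 1.0), (2.0, 0.5, 3.0), (0.5, 0.2, 1.5))]
```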

The theorem shows that the minimum interval, in the frequency domain, between adjacent sampling points (on the $k_x$ axis) is bounded, nonlinearly, by the degree of motion uncertainty. Consequently, the spatial sampling interval $p_x$ cannot be increased arbitrarily, but depends on the amount of motion uncertainty. Since $p_x$ is the inverse of $M_x$ ($p_x = 1/M_x$), and $M_x$ is bounded from below by $M_x^{\min}$, $p_x$ has a maximum value $p_x^{\max} = 1/M_x^{\min}$. If $p_x > p_x^{\max}$, adjacent patterns (cones) overlap and we have aliasing.
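To see the nonlinear trade-off, the sketch below evaluates $p_x^{\max} = 1/M_x^{\min}$ for increasing velocity uncertainty. Parameter values and the function name are illustrative, and the same assumptions as in the theorem's geometry ($v_x > 0$, $0 < \Delta v_x < 2v_x$) are made.

```python
import math


def p_max(v: float, dv: float, r: float) -> float:
    """Largest spatial sampling interval without aliasing, p_x^max = 1 / M_x^min.

    Assumes v > 0 and 0 < dv < 2 v, as in the theorem's geometry.
    """
    dtheta = math.atan(v + dv / 2) - math.atan(v - dv / 2)
    m_min = (2 * r * math.sin(dtheta / 2) * math.sqrt(1 + v ** 2)
             / (v + math.tan(dtheta / 2)))
    return 1.0 / m_min


# Illustrative values: v_x = 1, r = 1, growing velocity uncertainty.
uncertainties = (0.05, 0.1, 0.2, 0.4, 0.8)
limits = [p_max(1.0, dv, 1.0) for dv in uncertainties]
# limits decreases as the uncertainty grows: a larger Delta v_x forces finer sampling.
```

For small $\Delta v_x$ the product $p_x^{\max}\,\Delta v_x$ is roughly constant; as $\Delta v_x$ grows, the dependence departs from a simple inverse proportion.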


4  Conclusion 


The extraction of optical flow via space-time filtering is carried out with a collection of filters tuned to different orientations in space-time. Space-time Gabor filters and cascades of CS or DOG filters are especially suited to this task because they are (space-time) oriented filters. I show, in particular, that it is possible to use the cascaded filter approach of Fleet and Jepson [9] as an energy filter, provided that the exponential temporal part of the CS filter is replaced by a (temporal) Gaussian.

The space-time filtering approach to the extraction of optical flow is applied to a sequence of images that are closely spaced in time. The temporal interval between successive images in this sequence corresponds to the (temporal) sampling interval which, as we saw before, is not independent of the spatial sampling interval. In general, we want to use the image sequence so that we are still able to extract the optical flow while using the minimum number of images. This means that we have to increase the temporal sampling interval as much as possible without getting any aliasing effect. As a consequence, we have to ask what the upper limit of the temporal (spatial) sampling interval is such that:

1. We are still able to use a filtering approach to extract the optical flow

2.  We  do  not  get  any  aliasing  effect 

As shown by the theorem of the previous section, for uniformly translating patterns the maximum spatial sampling interval is determined by the degree of motion uncertainty. The same can be shown for temporal sampling. If we sample at a lower rate than the minimum established by the theorem of section 3, we get aliasing. This answers the second part of the question.

The first part of the question is more difficult to answer. Just as an illustration, we can mention a problem that bears similarities to the use of filtering or feature-matching approaches to the extraction of optical flow: the hypothesis that two distinct processes detect or extract optical flow in humans [18,19], called the short-range and long-range processes. They are studied in psychophysics through the phenomenon of apparent motion, which is the capability of the human visual system to interpolate the (spatial) position of moving objects between discrete presentations of successive snapshots of the motion. We can establish a general relationship between the short-range process and the filtering approach to optical flow, and between the long-range process and the feature-matching approach. The short-range process operates over short temporal intervals between successive frames (the inter-stimulus interval, ISI, ranging from 50 to 100 ms) and angular intervals of 15' or less. The long-range process, on the other hand, can take place even for ISIs as long as 400 ms [1], and it works mainly through the matching of features (edges, blobs, etc.), thus operating through the identification of elements in successive frames.

If the short- and long-range processes in humans are really independent and operate through different mechanisms, this suggests that, if the filtering and feature-matching approaches bear some resemblance to them, there should exist a definite borderline between the two approaches. In this sense, (space-time) aliasing is one criterion by which this question can be decided.
Figure 5 Two-layer cascade of space-time DOG filters with parameters 0.52, 0, r1 = 1.93 (first layer) and 1.04, 0, r2 = 3.86 (second layer).


Figure 8 The sampled version of the support of a uniformly translating pattern, as represented in Figure 7. The sampling interval is equal to $M_x$.


Figure 11 The sampled version of Figure 9 in the case where the sampling interval is such that adjacent cones touch each other without overlapping. The sampling interval $M_x = M_x^{\min}$ is the minimal one such that aliasing does not occur.


Diagram showing the relevant parameters involved in the proof of the theorem.


